The MNIST database (link) is a collection of handwritten digits.
The training set has $60{,}000$ samples and the test set has $10{,}000$ samples.
The digits are size-normalized and centered in a fixed-size image.
The data page describes how the data was collected. It also reports benchmark results for various algorithms on the test set.
In [ ]:
import numpy as np
import keras
from keras.datasets import mnist
In [ ]:
# Load the datasets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
In [ ]:
# What is the type of X_train?
In [ ]:
# What is the type of y_train?
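One way to answer both type questions is to look at the container type, the element dtype and the shape together. The helper `describe_array` is a hypothetical name used only for illustration. The sketch assumes the inputs are NumPy arrays, which is how the Keras documentation describes the output of `mnist.load_data()`.

```python
import numpy as np


def describe_array(name, arr):
    """Summarize an object's container type and, for NumPy arrays, its dtype and shape."""
    if not isinstance(arr, np.ndarray):
        # Not a NumPy array: only the Python type is meaningful here.
        return f"{name}: type={type(arr).__name__}"
    return f"{name}: type={type(arr).__name__}, dtype={arr.dtype}, shape={arr.shape}"
```

For example, run `print(describe_array("X_train", X_train))` and `print(describe_array("y_train", y_train))` in a new cell.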
In [ ]:
# Find the number of observations in the training data
In [ ]:
# Find the number of observations in the test data
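Each observation occupies one position along the first axis, so the number of observations is the length of that axis. The sketch below uses a hypothetical helper, `count_observations`. It also checks that the images and labels agree, which catches a mismatched pair such as `X_train` with `y_test`.

```python
import numpy as np


def count_observations(X, y):
    """Return the number of observations, checking that X and y have the same count."""
    n = np.shape(X)[0]  # observations are stacked along the first axis
    if n != len(y):
        raise ValueError(f"X has {n} rows but y has {len(y)} labels")
    return n
```

The results of `count_observations(X_train, y_train)` and `count_observations(X_test, y_test)` should match the sample counts quoted at the top of the notebook.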
In [ ]:
# Display the first 2 records of X_train
In [ ]:
# Display the first 10 records of y_train
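Slicing along the first axis returns the leading records, for example `X_train[:2]` and `y_train[:10]`. At NumPy's default line width, each image row may wrap across several printed lines. The sketch below uses a hypothetical helper, `head_str`, which passes a wider `max_line_width` to `np.array2string` so that each row is more likely to stay on one line. The value 200 is an arbitrary assumption.

```python
import numpy as np


def head_str(arr, n, linewidth=200):
    """Format the first n records along axis 0 as a string with wide lines."""
    return np.array2string(np.asarray(arr)[:n], max_line_width=linewidth)
```

Usage: `print(head_str(X_train, 2))` and `print(head_str(y_train, 10))`.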
In [ ]:
# Find the number of observations for each digit in the y_train dataset
In [ ]:
# Find the number of observations for each digit in the y_test dataset
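Calling `np.unique` with `return_counts=True` returns the distinct labels and how often each one occurs. The helper `digit_counts` is a hypothetical name. It converts that result into a plain dictionary keyed by digit.

```python
import numpy as np


def digit_counts(labels):
    """Return a dict mapping each distinct label to its number of occurrences."""
    digits, counts = np.unique(labels, return_counts=True)
    return {int(d): int(c) for d, c in zip(digits, counts)}
```

Call it as `digit_counts(y_train)` and `digit_counts(y_test)`. An alternative is `np.bincount(y_train, minlength=10)`, which returns a length-10 array indexed by digit and reports zero for any digit that never appears.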
In [ ]:
# What is the dimension (shape) of X_train? What does it mean?
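`X_train.ndim` gives the number of axes, and `X_train.shape` gives the length of each one. Assuming a stack of grayscale images, the three axes read as (number of samples, image height, image width). Here is a sketch with a hypothetical helper, `explain_shape`:

```python
import numpy as np


def explain_shape(X):
    """Describe an image batch shaped (n_samples, height, width)."""
    X = np.asarray(X)
    if X.ndim != 3:
        raise ValueError(f"expected a 3-D array, got ndim={X.ndim}")
    n, h, w = X.shape
    return f"{n} images, each {h} x {w} pixels"
```

Usage: `print(X_train.ndim, X_train.shape)` followed by `print(explain_shape(X_train))`.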
In [ ]:
from matplotlib import pyplot
import matplotlib as mpl
%matplotlib inline
In [ ]:
# Display the first training image, X_train[0]
In [ ]:
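fig = pyplot.figure()
ax = fig.add_subplot(1,1,1)
# Draw the pixel array X_train[0] in grayscale (higher values are darker with Greys)
imgplot = ax.imshow(X_train[0], cmap=mpl.cm.Greys)
# Show each pixel as a sharp square instead of smoothing between pixels
imgplot.set_interpolation('nearest')
# Put the ticks on the top and left, matching (row, column) array indexing
ax.xaxis.set_ticks_position('top')
ax.yaxis.set_ticks_position('left')
pyplot.show()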
In [ ]:
# Let's now display the 11th record
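The 11th record is at index 10, because Python indexing starts at 0. The sketch below wraps the plotting code from the previous cell in a hypothetical helper, `show_record`. The helper takes a 1-based position and puts the label in the title. Passing `cmap="Greys"` and `interpolation="nearest"` directly to `imshow` has the same effect as the separate calls used above.

```python
import numpy as np
from matplotlib import pyplot


def show_record(images, labels, position):
    """Plot the record at a 1-based position (position=11 is index 10)."""
    if position < 1:
        raise ValueError("position is 1-based and must be at least 1")
    index = position - 1  # convert the 1-based position to a 0-based index
    image = np.asarray(images[index])
    fig, ax = pyplot.subplots()
    ax.imshow(image, cmap="Greys", interpolation="nearest")
    # Ticks on the top and left, as in the cell above
    ax.xaxis.set_ticks_position("top")
    ax.yaxis.set_ticks_position("left")
    ax.set_title(f"record {position} (index {index}), label {labels[index]}")
    return fig
```

Usage: `fig = show_record(X_train, y_train, 11)` followed by `pyplot.show()`.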
In [ ]: